Abstract

Given a DNF formula $f$ on $n$ variables, the two natural size measures are the number of terms or size $s(f)$, and the maximum width of a term $w(f)$. It is folklore that short DNF formulas can be made narrow. We prove a converse, showing that narrow formulas can be sparsified. More precisely, any width $w$ DNF irrespective of its size can be $\varepsilon$-approximated by a width $w$ DNF with at most $(w \log(1/\varepsilon))^{O(w)}$ terms.

We combine our sparsification result with the work of Luby and Velickovic [LV91, LV96] to give a faster deterministic algorithm for approximately counting the number of satisfying solutions to a DNF. Given a formula on $n$ variables with $\mathrm{poly}(n)$ terms, we give a deterministic $n^{\tilde{O}(\log\log(n))}$ time algorithm that computes an additive $\varepsilon$ approximation to the fraction of satisfying assignments of $f$ for $\varepsilon = 1/\mathrm{poly}(\log n)$. The previous best result due to Luby and Velickovic from nearly two decades ago had a run-time of $n^{\exp(O(\sqrt{\log\log n}))}$ [LV91, LV96].



*Work done while an intern at Microsoft Research, Silicon Valley.
1 Introduction



A natural way to represent a Boolean function $f : \{0,1\}^n \to \{0,1\}$ is to write it as a CNF or DNF formula. The class of functions that admit compact representations of this form (aka polynomial size CNF and DNF formulae) are central to Boolean function analysis, computational complexity and machine learning.

Given a DNF formula $f$ on $n$ variables, the two natural size measures are the number of terms or size $s(f)$, and the maximum width of a term $w(f)$. The analogous measures for a CNF are the number of clauses and clause width. It is folklore that every DNF formula $f$ with $m$ terms can be $\varepsilon$-approximated by another DNF $g$ where $s(g) \le m$ and $w(g) \le \log(m/\varepsilon)$, regardless of $w(f)$. The formula $g$ is a sparsification of $f$ obtained by simply discarding all terms of width larger than $\log(m/\varepsilon)$. In other words, short DNF formulas can be made narrow. An analogous statement can be derived for CNFs.
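
The folklore truncation can be checked by brute force on a toy formula. The sketch below is illustrative only: the encoding of literals as signed integers, the helper names `evaluate`, `truncate` and `disagreement`, and the example formula are assumptions made here, not part of the paper.

```python
from itertools import product
from math import log2

def evaluate(dnf, x):
    """Evaluate a DNF on x, a tuple of 0/1 bits; literal +i means x_i, -i means NOT x_i (1-based)."""
    return any(all((x[abs(l) - 1] == 1) == (l > 0) for l in term) for term in dnf)

def truncate(dnf, eps):
    """Keep only the terms of width at most log2(m/eps), where m is the number of terms."""
    bound = log2(len(dnf) / eps)
    return [term for term in dnf if len(term) <= bound]

def disagreement(f, g, n):
    """Exact Pr_x[f(x) != g(x)] over uniform x, by enumerating all 2^n inputs (tiny n only)."""
    differing = sum(evaluate(f, x) != evaluate(g, x) for x in product((0, 1), repeat=n))
    return differing / 2 ** n

# Hypothetical example: m = 4 terms on n = 8 variables, eps = 1/4, so the width bound is log2(16) = 4.
n, eps = 8, 0.25
f = [frozenset({1, 2}), frozenset({3, -4}),
     frozenset({-1, -2, -3, -4, -5}), frozenset({2, 3, 4, 5, 6, 7, 8})]
g = truncate(f, eps)
error = disagreement(f, g, n)
```

Each discarded term has width above $\log(m/\varepsilon)$, so it is satisfied with probability below $\varepsilon/m$; a union bound over at most $m$ discarded terms keeps the error at most $\varepsilon$.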

In this work, we show the reverse connection: narrow formulae can be made short. 
Indeed, we prove the existence of a strong form of approximation known as sandwiching 
approximations which are important in pseudorandomness. In this work we only consider 
approximators which are also Boolean functions. 

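Definition 1.1. Let $f : \{0,1\}^n \to \{0,1\}$. We say that functions $f_u, f_\ell : \{0,1\}^n \to \{0,1\}$ are $\varepsilon$-sandwiching approximators for $f$ if $f_\ell(x) \le f(x) \le f_u(x)$ for every $x \in \{0,1\}^n$, and

$$\Pr_{x \in \{0,1\}^n}[f_\ell(x) \ne f(x)] = \Pr_{x \in \{0,1\}^n}[(f_\ell(x) = 0) \wedge (f(x) = 1)] \le \varepsilon,$$

$$\Pr_{x \in \{0,1\}^n}[f_u(x) \ne f(x)] = \Pr_{x \in \{0,1\}^n}[(f_u(x) = 1) \wedge (f(x) = 0)] \le \varepsilon.$$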

Our main result is the existence of $\varepsilon$-sandwiching approximators for arbitrary width $w$ DNFs using short width $w$ DNFs where the number of terms depends only on $w$ and $\varepsilon$.

Theorem 1.1. For every width-$w$ DNF formula $f$ and every $\varepsilon > 0$, there exist DNF formulae $f_\ell, f_u$ each of width $w$ and size at most $(w \log(1/\varepsilon))^{O(w)}$ which are $\varepsilon$-sandwiching approximators for $f$.

Our result is proved by a sparsification procedure for DNF formulae which uses the notion of quasi-sunflowers due to Rossman [Ros10]. The best previously known result along these lines was due to Trevisan [Tre04], who built on previous work by Ajtai and Wigderson [AW85]. Trevisan shows that every width $w$ DNF has $\varepsilon$-sandwiching approximators that are decision trees of depth $d = O(w 2^w \log(1/\varepsilon))$.

A $k$-junta is a function which depends only on $k$ variables. Given $f, g : \{0,1\}^n \to \{0,1\}$, we say that $g$ $\varepsilon$-approximates $f$ if

$$\Pr_x[f(x) \ne g(x)] \le \varepsilon.$$

A corollary of our result is the following junta theorem for DNFs.

Corollary 1.2. Every width-$w$ DNF formula is $\varepsilon$-approximated by a $(w \log(1/\varepsilon))^{O(w)}$-junta.

A similar but incomparable statement can be derived from Friedgut's junta theorem [Fri98]. It is easy to see that width $w$ DNFs have average sensitivity at most $2w$,¹ so by Friedgut's theorem any width $w$ DNF is $\varepsilon$-close to a $2^{O(w/\varepsilon)}$-junta. Friedgut's result gives better dependence on $w$, whereas we achieve much better dependence on $\varepsilon$. Friedgut's approximator is not a priori a small-width DNF, and one does not get sandwiching approximations. Trevisan's result implies that any width $w$ DNF is $\varepsilon$-approximated by a $k$-junta for $k = \exp(O(w 2^w \log(1/\varepsilon)))$ [Tre04].

Theorem 1.1 has interesting consequences for other parameter settings. One example is 
the following: 

Corollary 1.3. Every width-$O(\log n)$ DNF formula on $n$ variables is $n^{-O(1)}$ close to a DNF of width $O(\log n)$ and size $n^{O(\log\log(n))}$.
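
As a quick check of the parameters, the size bound in Corollary 1.3 can be obtained by substituting into Theorem 1.1. The constants $c, c'$ below are placeholders introduced here for illustration.

```latex
\begin{align*}
w = c\log n,\quad \varepsilon = n^{-c'}
\;\Longrightarrow\;
(w\log(1/\varepsilon))^{O(w)}
  &= (c\,c'\log^2 n)^{O(\log n)} \\
  &= 2^{O(\log n \cdot \log\log n)} \\
  &= n^{O(\log\log n)}.
\end{align*}
```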

In Section 6, we conjecture that a better bound should be possible in Theorem 1.1, which 
is singly exponential in w. If true, this conjecture will give better bounds for both Corollaries 
1.2 and 1.3. 

1.1 DNF Counting and Pseudorandom Generators

The problem of estimating the number of satisfying solutions to CNF and DNF formulae is closely tied to the problem of designing pseudorandom generators for such formulae with short seed-length. These problems have been studied extensively [KL83, AW85, NW94, Nis91, LV91, LV96, LVW93, Tre04, Baz09, Raz09, DETT10].

For a formula $f$, let

$$\mathrm{Bias}(f) = \Pr_{x \in \{0,1\}^n}[f(x) = 1].$$

Given a formula $f$ from a class $\mathcal{F}$ of functions, the goal of a counting algorithm for the class $\mathcal{F}$ is to compute $\mathrm{Bias}(f)$. We refer to the counting problems for CNFs and DNFs as #CNF and #DNF respectively. The problem of computing $\mathrm{Bias}(f)$ exactly is #P-hard [Val79], hence we look to approximate $\mathrm{Bias}(f)$.

An algorithm gives an $\varepsilon$-additive approximation for $\mathrm{Bias}(f)$ if its output is in the range $[\mathrm{Bias}(f) - \varepsilon, \mathrm{Bias}(f) + \varepsilon]$. It is easy to see that additive approximations for CNFs and DNFs are equivalent. There is a trivial solution based on random sampling, but finding a deterministic polynomial time algorithm has proved challenging.
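
The trivial randomized solution can be sketched as follows. This is a minimal illustration under assumptions made here: literals are encoded as signed integers, the sample count comes from Hoeffding's inequality, and the names `sampled_bias` and `exact_bias` as well as the example formula are hypothetical.

```python
import math
import random
from itertools import product

def evaluate(dnf, x):
    """Evaluate a DNF on x, a tuple of 0/1 bits; literal +i means x_i, -i means NOT x_i (1-based)."""
    return any(all((x[abs(l) - 1] == 1) == (l > 0) for l in term) for term in dnf)

def exact_bias(dnf, n):
    """Bias(f) computed by enumerating all 2^n assignments (tiny n only)."""
    return sum(evaluate(dnf, x) for x in product((0, 1), repeat=n)) / 2 ** n

def sampled_bias(dnf, n, eps, delta, rng):
    """Estimate Bias(f) to within eps, failing with probability at most delta (Hoeffding bound)."""
    samples = math.ceil(math.log(2 / delta) / (2 * eps * eps))
    hits = sum(evaluate(dnf, tuple(rng.randint(0, 1) for _ in range(n)))
               for _ in range(samples))
    return hits / samples

rng = random.Random(0)  # fixed seed so the run is reproducible
f = [frozenset({1, 2}), frozenset({-2, 3}), frozenset({3, 4, -5})]
estimate = sampled_bias(f, 5, eps=0.02, delta=1e-6, rng=rng)
```

The number of samples grows as $1/\varepsilon^2$ and is independent of $n$ and $m$; the difficulty discussed in the text is removing the randomness, not the cost.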

Computing multiplicative approximations to $\mathrm{Bias}(f)$ is harder, and here the complexities of #CNF and #DNF are very different. An algorithm is said to be a $c$-approximation algorithm if its output lies in the range $[\mathrm{Bias}(f), c\,\mathrm{Bias}(f)]$. It is easy to see that obtaining a multiplicative approximation for #CNF is NP-hard. Karp and Luby gave the first multiplicative approximation for #DNF; their algorithm is randomized [KL83]. There is a reduction between additive and multiplicative approximations for #DNF: for DNF formulae with $m$ terms, the problem of computing a $(1 + \varepsilon)$-multiplicative approximation can be reduced deterministically to the problem of computing an $(\varepsilon/m)$-additive approximation to #DNF. This reduction is stated explicitly in [LV96], where it is attributed to [KL83, KLM89].

¹ [Ama11] shows a sharp bound of $w$.

Derandomizing the Karp-Luby algorithm is an important problem in derandomization that has received a lot of attention starting from the work of Ajtai and Wigderson [AW85, LN90, LV91, LVW93, LV96, Tre04]. The best previous result is due to Luby and Velickovic [LV91, LV96] from nearly two decades ago: they gave a deterministic $n^{\exp(O(\sqrt{\log\log n}))}$ time algorithm that can compute an $\varepsilon$-additive approximation for any fixed constant $\varepsilon$.

A natural approach to this problem is to design pseudorandom generators (PRGs) with small seeds that can $\varepsilon$-fool depth two circuits. This problem and its generalization to constant depth circuits are central problems in pseudorandomness [AW85, NW94, Nis91, LV96, LVW93, Tre04, Baz09, Raz09, Bra10, DETT10].

Definition 1.4. A generator $G : \{0,1\}^r \to \{0,1\}^n$ $\delta$-fools a class $\mathcal{F}$ of functions if

$$\left| \Pr_{y \in \{0,1\}^r}[f(G(y)) = 1] - \mathrm{Bias}(f) \right| \le \delta$$

for all $f \in \mathcal{F}$. The generator is said to be explicit if $G$ is computable in time polynomial in $r$ and $n$.

A generator with seed-length $r$ that $\varepsilon$-fools DNFs with $m$ clauses gives an $\varepsilon$-additive approximation for $\mathrm{Bias}(f)$ in $\mathrm{poly}(m, 2^r)$ time by enumerating over all seeds. Such an algorithm only requires black-box access to $f$. The reduction from [KL83, KLM89] implies that an optimal pseudorandom generator for DNFs with seed-length $O(\log(mn/\varepsilon))$ will give a deterministic multiplicative approximation algorithm for #DNF. However, the best known generator, currently due to De, Etesami, Trevisan and Tulsiani [DETT10], requires seed length $\tilde{O}((\log(mn/\varepsilon))^2)$. The Luby-Velickovic algorithm is not a black-box algorithm, but PRGs for small-width DNFs are an important ingredient.

Our Results 

We use our sparsification lemma to give a better PRG for the class of width $w$ DNF formulae on $n$ variables, which we denote by $\mathrm{DNF}(w,n)$.²

Theorem 1.5. For all $\delta$, there exists an explicit generator $G : \{0,1\}^r \to \{0,1\}^n$ that $\delta$-fools $\mathrm{DNF}(w,n)$ and has seed-length

$$r = \tilde{O}\left(w^2 + w \log\left(\frac{1}{\delta}\right) + \log\log(n)\right).$$

In comparison, Luby and Velickovic [LV96] give a PRG with seed-length $O(2^w + \log\log n)$ for fooling width $w$ DNFs. Note that for $w = O(\log\log n)$ and $\delta$ constant, the seed-length of our generator is $\tilde{O}((\log\log n)^2)$, whereas Luby and Velickovic need seed-length $O(\log^{O(1)} n)$. For $w = \log\log(n)$ and $\delta \ge 1/\mathrm{poly}(n)$, our seed-length is still $\tilde{O}(\log n)$.

² The $\tilde{O}(\cdot)$ notation is used to hide terms that are logarithmic in the arguments.

The improved generator for small-width DNFs is obtained by using our sparsification result to reduce fooling width $w$ DNFs with an arbitrary number of terms to fooling width $w$ DNFs with $2^{\tilde{O}(w)}$ terms. We then apply recent results by De et al. on fooling DNF formulas using small-bias spaces. The fact that our sparsification gives sandwiching approximators is critical for this result.

The Luby-Velickovic counting algorithm can be viewed as a (non black-box) reduction from fooling DNFs of size $\mathrm{poly}(n)$ to fooling DNFs of smaller width. Given Theorem 1.5, we can improve and simplify their analysis to get a faster deterministic counting algorithm. This is the first progress on this well-studied problem in nearly two decades. In addition, we can allow for smaller values of $\varepsilon$.

Theorem 1.6. There is a deterministic algorithm which, when given a DNF formula $f$ on $n$ variables of size $m$ as input, returns an $O(\varepsilon)$-additive approximation to $\mathrm{Bias}(f)$ in time

$$\left(\frac{mn}{\varepsilon}\right)^{\tilde{O}(\log\log(n) + \log\log(m) + \log(1/\varepsilon))}.$$

For $m \le \mathrm{poly}(n)$ and $\varepsilon \ge 1/\mathrm{poly}(\log n)$, the running time is $n^{\tilde{O}(\log\log(n))}$.

Håstad's celebrated Switching Lemma [Has86] is a powerful tool in proving lower bounds for small-depth circuits. It also has applications in computational learning [LMN93, Man95] and PRG constructions [AW85, GMR+12]. As an additional application of our sparsification result, we give a partial derandomization of the switching lemma. The parameters we obtain are close to that of the previous best results due to Ajtai and Wigderson [AW85] and, perhaps more importantly, our argument is conceptually simpler, involving iterative applications of our sparsification result and a naive union bound. We defer the details to Section 5.

2 DNF Sparsification 

We will consider DNF formulas that are specified as $f = \vee_{i=1}^m T_i$ where the representation is minimal in the following sense:

• Each $T_i$ is non-constant. Hence each term is non-empty (else we replace it by 1), and does not contain a variable and its negation (else we replace it by 0). This guarantees that $\Pr_x[T_i = 1] \le 1/2$.

• No term $T_i$ implies some other term $T_j$; if it does, we can simply drop $T_i$ from the definition of $f$. This means that when viewed as sets of literals, $T_j \not\subseteq T_i$. A consequence is that $T_i \cap T_j \subsetneq T_i$.

If some stage of our sparsification produces a representation which is not minimal, we can convert it to a minimal representation without increasing the number of terms.
We call a DNF $f$ unate if it does not contain a variable and its negation.
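
The conversion to a minimal representation, and the unateness test, can be sketched as below. The encoding (a term is a set of signed integers, with $+i$ for $x_i$ and $-i$ for its negation) and the function names `minimize` and `is_unate` are assumptions made for this illustration.

```python
def minimize(dnf):
    """Return a minimal representation: no contradictory, duplicate, or redundant terms."""
    terms = {frozenset(t) for t in dnf}
    # A term containing both x_i and NOT x_i is identically 0 and can be dropped.
    terms = {t for t in terms if not any(-l in t for l in t)}
    # An empty term is identically 1, so the whole formula is the constant 1.
    if frozenset() in terms:
        return [frozenset()]
    # A term that strictly contains another term (as a set of literals) implies it and is redundant.
    kept = [t for t in terms if not any(s < t for s in terms)]
    return sorted(kept, key=sorted)

def is_unate(dnf):
    """True if no variable appears both positively and negatively anywhere in the formula."""
    literals = set().union(*dnf)
    return not any(-l in literals for l in literals)

minimal = minimize([{1, 2}, {1}, {3, -3}, {2, 3}, {2, 3}])
```

Every step only removes terms, so the number of terms never increases, as the text requires.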

2.1 Sparsification using Sunflowers

We will first show the following weaker version of Theorem 1.1, which has a bound of $(w 2^w \ln(m/\varepsilon))^w$ and assumes that $f$ is unate. The proof will illustrate the key ideas behind our sparsification procedure.

Theorem 2.1. For every unate DNF formula $f$ with width $w$ and size $m$ and every $\varepsilon > 0$, there exist DNF formulae $f_\ell, f_u$ each with width $w$ and at most $(w 2^w \ln(m/\varepsilon))^{w}$ terms which are $\varepsilon$-sandwiching approximators for $f$.

The starting point of our sparsification result is the Erdős-Rado Sunflower Lemma [ER60].

Definition 2.1. Let $k \ge 3$. A collection of subsets $S_1, \ldots, S_k \subseteq [n]$ is a sunflower with core $Y$ if $Y \subseteq S_i$ for all $i$ and $S_i \cap S_j = Y$ for all $i \ne j$. The sets $S_i \setminus Y$ are called the petals.

The set systems that we consider will arise from the terms in some minimal representation 
of a monotone DNF. This will ensure that the petals are always non-empty, although the 
core might be empty. 

The celebrated Erdős-Rado Sunflower Lemma guarantees that every sufficiently large set system of bounded size sets contains large sunflowers.

Theorem 2.2. (Sunflower Lemma, [ER60]) Let $\mathcal{F} = \{S_1, \ldots, S_m\}$ be a collection of subsets of $[n]$, each of cardinality at most $w$. If $m > w!(k-1)^w$, then $\mathcal{F}$ has a sunflower of size $k$.

The lemma and its variants have found several applications in complexity theory; we refer the reader to [Juk01, Chapter 7] for more details. We will use it to prove Theorem 2.1.

Proof. (Proof of Theorem 2.1.) Fix a unate, width $w$ DNF $f = T_1 \vee T_2 \vee \cdots \vee T_m$ and for simplicity suppose that $f$ is monotone. Since $f$ is monotone, we can think of each term $T_i$ as a set of variables of size at most $w$. Set $k = 2^w \ln(m/\varepsilon)$. Provided

$$m > \left(w 2^w \ln\left(\frac{m}{\varepsilon}\right)\right)^w \ge w!\,(k-1)^w \qquad (2.1)$$

the Sunflower Lemma guarantees the existence of a collection of terms $T_{i_1}, \ldots, T_{i_k}$ with a core $Y = \cap_{j=1}^k T_{i_j}$ and disjoint petals $T_{i_j} \setminus Y$. Hence we can write

$$\vee_{j=1}^k T_{i_j} = Y \wedge \left(\vee_{j=1}^k (T_{i_j} \setminus Y)\right) = Y \wedge g \quad \text{where } g = \vee_{j=1}^k (T_{i_j} \setminus Y).$$

Note that $g$ is a read-once DNF of width $w$ and size $k = 2^w \ln(m/\varepsilon)$, so it is almost surely satisfied by a random assignment:

$$\Pr_x[g(x) = 0] = \prod_{j=1}^k \Pr_x[T_{i_j} \setminus Y = 0] \le \left(1 - \frac{1}{2^w}\right)^k \le \frac{\varepsilon}{m}.$$

The first inequality holds because each $T_{i_j} \setminus Y$ is a term with width at most $w$, and the second by our choice of $k$.

Thus a natural way to get an upper sandwiching approximation is to replace $g(x)$ by the constant 1, which is equivalent to replacing $\vee_{j=1}^k T_{i_j}$ with $Y$. Let $f' : \{0,1\}^n \to \{0,1\}$ be the DNF formula obtained by this replacement. It is clear that $f(x) \le f'(x)$. Further,

$$\Pr_x[(f(x) = 0) \wedge (f'(x) = 1)] \le \Pr_x[g(x) = 0] \le \frac{\varepsilon}{m}.$$

Finally, we have $s(f') \le s(f) - (k-1)$.

We can now iteratively apply the above argument as long as the number of terms is larger than the bound in Equation (2.1). In each iteration we reduce the size by $k - 1$. Thus, we repeat the process at most $m/(k-1)$ times, obtaining an upper approximating formula $f_u$ where

$$f(x) \le f_u(x) \quad \forall x \in \{0,1\}^n,$$

$$\Pr_x[f_u(x) \ne f(x)] \le \frac{m}{k-1} \cdot \frac{\varepsilon}{m} \le \varepsilon,$$

$$s(f_u) \le \left(w 2^w \ln\left(\frac{m}{\varepsilon}\right)\right)^w.$$

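We next describe the construction of the lower approximating formula $f_\ell$. We start with the sunflower $T_{i_1}, \ldots, T_{i_k}$ with core $Y$. Now consider the formula $f''$ obtained from $f$ by dropping one of the terms, say $T_{i_1}$. Then, $f''(x) \le f(x)$. Further, the two of them differ only if $f''(x) = 0$ and $f(x) = 1$, which happens only if $T_{i_1} = 1$ whereas $T_{i_j} = 0$ for $j \in \{2, \ldots, k\}$. Hence we can bound this probability by

$$\Pr_x[f''(x) \ne f(x)] \le \Pr_x[T_{i_1} = 1] \cdot \Pr_x\left[(\vee_{j=2}^k T_{i_j}) = 0 \,\middle|\, T_{i_1} = 1\right]$$

$$\le \frac{1}{2} \Pr_x\left[(\vee_{j=2}^k (T_{i_j} \setminus Y)) = 0\right] \le \frac{1}{2}\left(1 - \frac{1}{2^w}\right)^{k-1} \le \frac{\varepsilon}{m}$$

where the second inequality holds since by the sunflower property, conditioning on $T_{i_1} = 1$ fixes the core $Y = 1$, but does not affect the other petals. Note that $s(f'') \le s(f) - 1$. We now iterate this step no more than $m$ times to obtain a formula $f_\ell$ where

$$f_\ell(x) \le f(x) \quad \forall x \in \{0,1\}^n,$$

$$\Pr_x[f_\ell(x) \ne f(x)] \le m \cdot \frac{\varepsilon}{m} = \varepsilon,$$

$$s(f_\ell) \le \left(w 2^w \ln\left(\frac{m}{\varepsilon}\right)\right)^w.$$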

□ 
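
The upper-approximation step of this proof can be tried on a toy monotone formula. The sketch below is an illustration under assumptions made here: the sunflower is found by brute force (the Sunflower Lemma only guarantees existence), terms are sets of 1-based variable indices, and the names `find_sunflower` and `collapse` are hypothetical.

```python
from itertools import combinations, product

def evaluate(terms, x):
    """Monotone DNF: each term is a set of 1-based variable indices that must all be 1."""
    return any(all(x[i - 1] == 1 for i in term) for term in terms)

def find_sunflower(terms, k):
    """Brute-force search for k terms whose pairwise intersections all equal their common core."""
    for group in combinations(terms, k):
        core = frozenset.intersection(*group)
        if all(a & b == core for a, b in combinations(group, 2)):
            return group, core
    return None

def collapse(terms, group, core):
    """Upper approximator: replace the k sunflower terms by the single term Y (the core)."""
    return [t for t in terms if t not in group] + [core]

n = 6
f = [frozenset({1, 2}), frozenset({1, 3}), frozenset({1, 4}), frozenset({5, 6})]
group, core = find_sunflower(f, 3)
f_upper = collapse(f, group, core)
petals = [t - core for t in group]
inputs = list(product((0, 1), repeat=n))
# Pr[f = 0 and f' = 1], and Pr[g = 0] for the petal formula g.
error = sum(evaluate(f_upper, x) and not evaluate(f, x) for x in inputs) / len(inputs)
petals_all_false = sum(not evaluate(petals, x) for x in inputs) / len(inputs)
```

Here the three terms sharing $x_1$ form a sunflower with core $\{x_1\}$; replacing them by the core removes two terms, and the one-sided error is bounded by the probability that all petals are false.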

Theorem 2.1 is weaker than Theorem 1.1 in the assumption of unateness, the dependence on $m$ and the dependence on $w$. We briefly sketch how one can handle the first two issues.

1. Unateness. One can remove this assumption by using Lemma 2.7 which guarantees that any DNF formula contains a large sub-formula which is unate. The resulting statement already suffices for Corollary 1.3, since any width $\log(n)$ DNF can have at most $n^{O(\log(n))}$ many clauses.



2. Dependence on $m$. The size of the approximators depends logarithmically on $m$. One can avoid this by observing that when the formula size is large, the error resulting from each step of the sparsification is tiny. One can use this argument to get a size bound of $(2^w \ln(1/\varepsilon))^{O(w)}$ which is independent of $m$.

3. Dependence on $w$. The final bound is exponential in $w^2$ rather than $w$. This comes from the $(k-1)^w$ term in the Sunflower Lemma, which we apply for $k = 2^w \ln(m/\varepsilon)$. The question of whether the $w!$ term in the Sunflower Lemma is necessary is a well-known open problem in combinatorics. But there is a lower bound of $(k-1)^w$ [Juk01]. So even if the lower bound were to be the right answer, it does not (directly) imply a better bound for Theorem 2.1.

2.2 Sparsification using Quasi-Sunflowers.

The main property of the sunflower system we used in Theorem 2.1 is that the formula $g$ on the petals is highly biased towards 1. As shown by Rossman [Ros10], one can guarantee the existence of such "quasi-sunflower" systems satisfying this weaker property, even when the number of terms is much smaller than in the usual sunflower lemma. We adapt our argument to use quasi-sunflowers instead of sunflowers, to obtain Theorem 1.1.
We shall use the notion of quasi-sunflower due to Rossman [Ros10].

Definition 2.2. (Quasi-Sunflowers, [Ros10]) A unate DNF formula $h = \vee_{i=1}^k T_i$ where $k \ge 2$ is a $\gamma$-quasi-sunflower with core $Y = \cap_{i=1}^k T_i$ and petals $\{T_i \setminus Y\}_{i=1}^k$ if

$$\Pr_x\left[\vee_{i=1}^k (T_i \setminus Y) = 1\right] \ge 1 - e^{-\gamma}.$$

Quasi-sunflowers extend the notion of a sunflower in the sense that even though the "petals" $(T_i \setminus Y)$ are not necessarily disjoint, the probability that none of them is satisfied is small. We disallow $k = 1$, since otherwise every term is trivially a quasi-sunflower. Since we insist that no term of a DNF is contained in another, the petals are non-empty. Hence each petal is satisfied with probability at most $1/2$, so every $\gamma$-sunflower has $k = \Omega(\gamma)$ petals.

Lemma 2.3. (Quasi-Sunflower Lemma, [Ros10]) Any unate width $w$ DNF formula with $m$ terms contains a $\gamma(m)$-quasi-sunflower where

$$\gamma(m) = \cdots \qquad (2.2)$$

Rossman states the result in the language of set systems, which we have rephrased in the language of DNFs. We show the equivalence of the two in the appendix.

The following lemma will be used to analyze a single step of our sparsification.

Lemma 2.4. Let $g = \vee_{i=1}^k T_i$ be a unate DNF. Then

$$\Pr_x\left[(T_1 = 1) \wedge ((\vee_{i=2}^k T_i) = 0)\right] \le \Pr_x\left[(\vee_{i=1}^k T_i) = 0\right].$$

Proof. Without loss of generality suppose that $g$ is monotone. Since every term in $g$ is also monotone, Kleitman's lemma [AS11, Chapter 6] implies that

$$\Pr_x\left[(T_1 = 0) \wedge ((\vee_{i=2}^k T_i) = 0)\right] \ge \Pr_x[T_1 = 0] \cdot \Pr_x\left[(\vee_{i=2}^k T_i) = 0\right],$$

$$\Pr_x\left[(T_1 = 1) \wedge ((\vee_{i=2}^k T_i) = 0)\right] \le \Pr_x[T_1 = 1] \cdot \Pr_x\left[(\vee_{i=2}^k T_i) = 0\right].$$

Hence we have

$$\frac{\Pr_x\left[(T_1 = 0) \wedge ((\vee_{i=2}^k T_i) = 0)\right]}{\Pr_x[T_1 = 0]} \ge \Pr_x\left[(\vee_{i=2}^k T_i) = 0\right] \ge \frac{\Pr_x\left[(T_1 = 1) \wedge ((\vee_{i=2}^k T_i) = 0)\right]}{\Pr_x[T_1 = 1]}.$$

But this implies that

$$\Pr_x\left[(T_1 = 1) \wedge ((\vee_{i=2}^k T_i) = 0)\right] \le \Pr_x\left[(\vee_{i=1}^k T_i) = 0\right] \cdot \frac{\Pr_x[T_1 = 1]}{\Pr_x[T_1 = 0]} \le \Pr_x\left[(\vee_{i=1}^k T_i) = 0\right]$$

where the last inequality follows because for any (non-empty) term $T$,

$$\Pr_x[T = 1] \le \frac{1}{2} \le \Pr_x[T = 0]. \qquad (2.3)$$

□

The only property of $T_1$ that we use is that $\Pr_x[T_1 = 1] \le \Pr_x[T_1 = 0]$. Indeed, we can drop any set of terms $\{T_i\}_{i \in S}$ which satisfies $\Pr_x[\vee_{i \in S} T_i = 1] \le \Pr_x[\vee_{i \in S} T_i = 0]$.

The following is our key technical lemma. It applies to unate formulae and allows us to 
reduce the size of formula by (at least) 1. 

Lemma 2.5. For every unate width-$w$ DNF formula $g$ of size $m$, there exist width-$w$ DNF formulae $g_\ell, g_u$ each of size at most $m - 1$ that are $e^{-\gamma(m)}$-sandwiching approximators for $g$.

Proof. Let $g = \vee_{i=1}^m T_i$. Lemma 2.3 guarantees the existence of a $\gamma(m)$-quasi-sunflower $h = \vee_{j=1}^k T_{i_j}$ where $\gamma(m)$ is given by Equation (2.2). Letting $p(x) = \vee_{j=1}^k (T_{i_j} \setminus Y)$ be the formula on the petals, we have $\Pr_x[p(x) = 0] \le e^{-\gamma(m)}$. We can write

$$h(x) = \vee_{j=1}^k T_{i_j} = Y \wedge \left(\vee_{j=1}^k (T_{i_j} \setminus Y)\right) = Y \wedge p(x).$$

We get an upper sandwiching DNF formula $g_u : \{0,1\}^n \to \{0,1\}$ from $g(x)$ by replacing $p(x)$ by the constant 1, which is equivalent to replacing $h(x)$ with the core $Y$. It is clear that

$$g(x) \le g_u(x), \qquad s(g_u) \le s(g) - (k-1) \le s(g) - 1.$$

Further,

$$\Pr_x[g(x) \ne g_u(x)] = \Pr_x[(g(x) = 0) \wedge (g_u(x) = 1)] \le \Pr_x[p(x) = 0] \le e^{-\gamma(m)}.$$

We now construct the lower sandwiching approximation. Let $g_\ell$ be the formula obtained from $g$ by dropping the term $T_{i_1}$. Then, it is clear that

$$g_\ell(x) \le g(x), \qquad s(g_\ell) \le s(g) - 1.$$

Further,

$$\Pr_x[g(x) \ne g_\ell(x)] = \Pr_x[(g(x) = 1) \wedge (g_\ell(x) = 0)]$$

$$\le \Pr_x\left[((T_{i_1} \setminus Y) = 1) \wedge ((\vee_{j=2}^k (T_{i_j} \setminus Y)) = 0)\right]$$

$$\le \Pr_x[p(x) = 0] \quad \text{(By Lemma 2.4)}$$

$$\le e^{-\gamma(m)}.$$

□

One can prove Theorem 1.1 for unate DNFs by repeated applications of this Lemma. 
To handle the general case, we use the following simple lemmas to reduce the problem of 
constructing sandwiching approximations to the unate case. 

Lemma 2.6. Let $f, g, h : \{0,1\}^n \to \{0,1\}$ be such that $f = g \vee h$. Let $g_\ell, g_u$ be $\varepsilon$-sandwiching approximators for $g$. Then $g_\ell \vee h$ and $g_u \vee h$ are $\varepsilon$-sandwiching approximators for $f$.

Proof. It is easy to see that for every $x \in \{0,1\}^n$,

$$g_\ell(x) \vee h(x) \le g(x) \vee h(x) \le g_u(x) \vee h(x).$$

We bound the approximation error for $g_\ell \vee h$; the proof for $g_u \vee h$ is similar.

$$\Pr_x[(g_\ell(x) \vee h(x)) \ne (g(x) \vee h(x))] = \Pr_x[(g_\ell(x) \vee h(x) = 0) \wedge (g(x) \vee h(x) = 1)]$$

$$= \Pr_x[(g_\ell(x) = 0) \wedge (g(x) = 1) \wedge (h(x) = 0)]$$

$$\le \Pr_x[(g_\ell(x) = 0) \wedge (g(x) = 1)]$$

$$\le \varepsilon.$$

□

Lemma 2.7. For every width $w$ DNF $f = \vee_{i=1}^m T_i$ of size $m$, there exists $S \subseteq [m]$ where $|S| \ge m/2^w$ such that the formula $g = \vee_{i \in S} T_i$ is unate.

Proof. Pick a random set of literals $S$ as follows: for each of the variables $x_i$ add one of $x_i$ or $\bar{x}_i$ to $S$ uniformly at random. Let $g_S$ be the sub-formula of $f$ formed of terms containing only literals from $S$. Then, $g_S$ is always unate.

Each term has at least a $2^{-w}$ chance of being in $g_S$. By linearity of expectation, $\mathbb{E}_S[s(g_S)] \ge m/2^w$, so some choice of $S$ yields a unate sub-formula with at least $m/2^w$ terms.

□
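
The random choice in this proof can be enumerated exhaustively on a small example. In the sketch below, the encoding of literals as signed integers, the helper names, and the example formula are all assumptions made for illustration; the paper's argument is probabilistic and does not enumerate.

```python
from itertools import product

def unate_subformula(dnf, signs):
    """Keep the terms whose literals all have the polarity chosen for their variable.
    signs[i - 1] is +1 or -1 and selects x_i or NOT x_i respectively."""
    return [t for t in dnf if all((l > 0) == (signs[abs(l) - 1] > 0) for l in t)]

def is_unate(dnf):
    """True if no variable appears both positively and negatively anywhere in the formula."""
    literals = set().union(*dnf)
    return not any(-l in literals for l in literals)

# Hypothetical width-2 formula on 3 variables with m = 4 terms.
n, w = 3, 2
f = [frozenset({1, 2}), frozenset({-1, 3}), frozenset({-2, -3}), frozenset({1, -3})]
candidates = [unate_subformula(f, signs) for signs in product((1, -1), repeat=n)]
average_size = sum(len(g) for g in candidates) / len(candidates)
best = max(candidates, key=len)
```

Averaged over all polarity choices, a term of width $|T|$ survives with probability exactly $2^{-|T|}$, so the largest surviving sub-formula has at least $m/2^w$ terms.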

We will use the following asymptotic bound whose proof is a calculation and is deferred to the appendix.

Fact 2.8. For $\gamma : \mathbb{R}^+ \to \mathbb{R}^+$ defined by Equation (2.2), $W = (2w)^{3w}(50 \log(1/\varepsilon))^w$, and $\varepsilon \le 1/4$,

$$\sum_{j=W+1}^{m} e^{-\gamma(j/2^w)} \le \varepsilon.$$

We can now prove Theorem 1.1:

Proof. Let $f = \vee_{i=1}^m T_i$. By applying Lemma 2.7, we can write $f = g \vee h$ where $g$ is unate and has $m' \ge m/2^w$ terms. By Lemma 2.5, there exist sandwiching approximators $g_\ell, g_u$ each of width $w$ and size at most $m' - 1$, whose error is bounded by

$$e^{-\gamma(m')} \le e^{-\gamma(m/2^w)}.$$

By Lemma 2.6, $f_\ell = g_\ell \vee h$ and $f_u = g_u \vee h$ are $e^{-\gamma(m/2^w)}$-sandwiching approximations for $f$. Further

$$s(f_\ell) = s(g_\ell) + s(h) \le s(g) - 1 + s(h) \le s(f) - 1$$

and similarly $s(f_u) \le s(f) - 1$.

We iterate this construction separately for the upper and lower approximator till the size of the formulae drops below $W$. This gives the sequence

$$f(x) \le f_u^{1}(x) \le \cdots \le f_u^{k}(x) := f_u(x)$$

$$f(x) \ge f_\ell^{1}(x) \ge \cdots \ge f_\ell^{k}(x) := f_\ell(x)$$

where $s(f_\ell), s(f_u) \le W$. We can bound the error of these approximators by

$$\sum_{j=W+1}^{m} e^{-\gamma(j/2^w)} \le \varepsilon, \qquad (2.4)$$

where the inequality is from Fact 2.8. This completes the proof of Theorem 1.1. □



3 Fooling Small-Width DNFs

We next use our sparsification result to construct a pseudorandom generator for small-width DNFs, obtaining an exponential improvement in terms of the width over the generator of Luby and Velickovic [LV96]. We restate Theorem 1.5 with the exact asymptotics for $r$.

Theorem 3.1. For all $\delta$, there exists an explicit generator $G : \{0,1\}^r \to \{0,1\}^n$ that $\delta$-fools all width $w$ DNFs and has seed-length

$$r = O\left(w^2 \log^2(w) + w \log(w) \log\left(\frac{1}{\delta}\right) + \log\log(n)\right).$$

We prove the theorem as follows: we first use our sparsification result to reduce the case of fooling width $w$ DNFs with an arbitrary number of terms to that of fooling width $w$ DNFs with $2^{\tilde{O}(w)}$ terms and then apply the recent results due to De et al. [DETT10] showing that small-bias spaces fool DNFs with few terms.

Definition 3.2 ($k$-wise $\varepsilon$-biased spaces). A distribution $\mathcal{D}$ over $\{0,1\}^n$ is said to be a $(k,\varepsilon)$-biased space if for every non-empty subset $I \subseteq [n]$ of size at most $k$,

$$\left| \Pr_{x \leftarrow \mathcal{D}}\left[\oplus_{i \in I} x_i = 1\right] - \frac{1}{2} \right| \le \varepsilon.$$

Naor and Naor [NN93] constructed explicit $(k,\varepsilon)$-biased spaces that require only $O(k + \log(1/\varepsilon) + \log\log n)$ bits to sample from.

Next, we need the following result of De et al. [DETT10] showing that $(k,\varepsilon)$-biased spaces fool DNFs for suitable choices of $k$ and $\varepsilon$.

Theorem 3.3. [DETT10, Theorem 4.1] For every $\delta > 0$, every DNF with width $w$ and size $m$ is $\delta$-fooled by $(k,\varepsilon)$-biased distributions for

$$k = O\left(w \log\left(\frac{m}{\delta}\right)\right), \qquad \log\left(\frac{1}{\varepsilon}\right) = O\left(w \log(w) \log\left(\frac{m}{\delta}\right)\right).$$

De et al. prove the above statement only for the case of $k = n$, and they use the bound $w \le \log(m/\delta)$. Their proof proceeds by constructing small $\ell_1$-norm sandwiching approximators. The above statement is obtained by repeating their proof keeping $w$ and $m$ separate, and bounding both the degree and the $\ell_1$ norm of the resulting approximators. It is easy to see from their proof that the approximators have degree $k \le O(w \log(m/\delta))$ and $\ell_1$-norm bounded by $(m/\delta)^{O(w \log(w))}$.

We use the fact that to fool a class of functions, it suffices to fool sandwiching approximators [BGGP07, Baz09].

Fact 3.4. Let $\mathcal{F}, \mathcal{G}$ be classes of functions such that every $f \in \mathcal{F}$ has $\varepsilon$-sandwiching approximators in $\mathcal{G}$. Let $G : \{0,1\}^r \to \{0,1\}^n$ be a pseudorandom generator that $\delta$-fools $\mathcal{G}$. Then $G$ $(\varepsilon + \delta)$-fools $\mathcal{F}$.

We are now ready to prove the main result of this section. 

Proof of Theorem 3.1. Recall that $\mathrm{DNF}(w,n)$ denotes the class of all width $w$ DNFs on $n$ variables. Let $\mathcal{G} \subseteq \mathrm{DNF}(w,n)$ denote the subset of all formulae with size at most $m = (w \log(1/\delta))^{cw}$ for some sufficiently large constant $c$. By Theorem 1.1, every $f \in \mathrm{DNF}(w,n)$ has $\delta$-sandwiching approximators in $\mathcal{G}$.

Next, we apply Theorem 3.3 with $m = (w \log(1/\delta))^{cw}$. Note that

$$\log\left(\frac{m}{\delta}\right) = O\left(w \log(w) + \log\left(\frac{1}{\delta}\right)\right).$$

So we conclude that $(k,\varepsilon)$-biased distributions $\delta$-fool $\mathcal{G}$ where

$$k = O\left(w^2 \log(w) + w \log\left(\frac{1}{\delta}\right)\right), \qquad \log\left(\frac{1}{\varepsilon}\right) = O\left(w^2 \log^2 w + w \log(w) \log\left(\frac{1}{\delta}\right)\right).$$

Note that we can sample from such a distribution using a seed of length

$$r = O\left(k + \log\left(\frac{1}{\varepsilon}\right) + \log\log(n)\right) = O\left(w^2 \log^2(w) + w \log(w) \log\left(\frac{1}{\delta}\right) + \log\log(n)\right).$$

Finally, by Fact 3.4, such distributions $2\delta$-fool the class $\mathrm{DNF}(w,n)$. □

4 Deterministic Counting for DNFs 

We now use the PRG for small-width DNFs from the previous section in the Luby-Velickovic counting algorithm [LV96]. The better seed-length means that we do not need to balance various parameters as carefully, and can redo their arguments with simpler and better settings of parameters.

The input to our algorithm is a DNF formula $f = \vee_{i=1}^m T_i$ on $n$ variables with size $m$ and width $w$, and the output is an $\varepsilon$-additive approximation to $\mathrm{Bias}(f)$. We set the following parameters

$$k := \log\left(\frac{w}{\varepsilon}\right), \qquad t := \frac{w}{k}, \qquad w' := 6k, \qquad \delta := \frac{\varepsilon}{t}.$$

Let $\mathcal{H} = \{h : [n] \to [t]\}$ be a family of $k$-wise independent hash functions. Fix a hash function $h \in \mathcal{H}$ and let $B_j = \{i : h(i) = j\}$. We say the term $T_i$ is bad for $h$ if

$$\max_{j \in [t]} |B_j \cap T_i| > w'$$

where we view $T_i$ as a set of variables. Let $f_h$ be the formula obtained from $f$ by dropping all terms that are bad for $h$.

Let $G : \{0,1\}^r \to \{0,1\}^n$ be the generator from Theorem 1.5 that fools $\mathrm{DNF}(w',n)$ with error at most $\delta$. Define a new generator $G_h : (\{0,1\}^r)^t \to \{0,1\}^n$ as follows:

$$G_h(z_1, \ldots, z_t) = x, \quad \text{where for } j \in [t],\; x|_{B_j} = G(z_j)|_{B_j}. \qquad (4.1)$$

Thus $G_h$ applies an independent copy of $G$ to each bucket defined by the hash function $h$. We now state the counting algorithm:
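
The bucketing step can be sketched as follows. This is an illustration under assumptions made here: buckets are 0-based, the hash function is given as a list `h` with `h[i - 1]` the bucket of variable $x_i$, literals are signed integers, and the generator passed to `compose` is a stand-in (the identity map on $n$-bit seeds), not the generator of Theorem 1.5.

```python
def is_bad(term, h, t, w_prime):
    """A term is bad for h if some bucket B_j holds more than w_prime of its variables."""
    counts = [0] * t
    for literal in term:
        counts[h[abs(literal) - 1]] += 1
    return max(counts) > w_prime

def drop_bad_terms(dnf, h, t, w_prime):
    """The formula f_h: f with every term that is bad for h removed."""
    return [term for term in dnf if not is_bad(term, h, t, w_prime)]

def compose(G, h, seeds):
    """G_h(z_1, ..., z_t): coordinate i is read from G(z_j) for the bucket j = h(i)."""
    outputs = [G(z) for z in seeds]
    return tuple(outputs[h[i]][i] for i in range(len(h)))

# Hypothetical toy instance: n = 4 variables, t = 2 buckets.
h = [0, 1, 0, 1]
f = [frozenset({1, -2}), frozenset({1, 3, -2}), frozenset({2, 4})]
f_h = drop_bad_terms(f, h, t=2, w_prime=1)
# Stand-in generator (an assumption for illustration only): the identity on 4-bit seeds.
x = compose(lambda z: z, h, [(0, 0, 0, 0), (1, 1, 1, 1)])
```

After bad terms are dropped, every remaining term restricted to a single bucket has width at most $w'$, which is why a generator for width-$w'$ DNFs is applied independently to each bucket.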
We now state the counting algorithm: 

Algorithm DNFCount

For each $h \in \mathcal{H}$,

    Drop all bad terms for $h$ from $f$ to obtain $f_h$.

    By enumeration over all $z \in \{0,1\}^{rt}$, compute

    $$p_h = \Pr_{z \in \{0,1\}^{rt}}[f_h(G_h(z)) = 1]. \qquad (4.2)$$

Return $p_{\mathcal{H}} = \max_{h \in \mathcal{H}} p_h$.





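We need the following lemma about $k$-wise independent hash functions.

Lemma 4.1. Let $\mathcal{H} : [n] \to [t]$ be a $k$-wise independent family of hash functions. Then, for every set $S \subseteq [n]$ of size $|S| \le kt$, and every $j \in [t]$,

$$\Pr_{h \in \mathcal{H}}\left[|h^{-1}(j) \cap S| \ge 6k\right] \le 2^{-k}.$$

Proof. Fix $j \in [t]$. Let $S = \{1, \ldots, kt\}$ without loss of generality. Let $\{X_i\}_{i=1}^{kt}$ be indicator random variables that are 1 if $h(i) = j$ and 0 otherwise. Then

$$\mathbb{E}_{h \in \mathcal{H}}\left[\sum_{I \subseteq S,\, |I| = k} \prod_{i \in I} X_i\right] = \binom{kt}{k} \frac{1}{t^k} \le e^k.$$

Applying Markov's inequality,

$$\Pr_{h \in \mathcal{H}}\left[|h^{-1}(j) \cap S| \ge 6k\right] \le \frac{e^k}{\binom{6k}{k}} \le \frac{e^k}{6^k} \le 2^{-k}.$$

□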

Our analysis requires two Lemmas from [LV96]. Since their terminology and notation 
differs from ours, we provide proofs of both these Lemmas in Appendix B. 
The first Lemma relates the bias of fh with that of /. 

Lemma 4.2. [LV96, Lemma 11] We have

$$\forall h \in \mathcal{H}, \quad \mathrm{Bias}(f_h) \le \mathrm{Bias}(f),$$

$$\mathbb{E}_{h \in \mathcal{H}}[\mathrm{Bias}(f_h)] \ge \mathrm{Bias}(f) - \varepsilon.$$

The next lemma showing that $G_h$ fools the formula $f_h$ is essentially [LV96, Lemma 7]. Recall that by Equation (4.2), $p_h$ is the bias of $f_h$ under the distribution generated by $G_h$.

Lemma 4.3. [LV96, Lemma 7] We have $|p_h - \mathrm{Bias}(f_h)| \le \varepsilon$.

With these Lemmas in hand, we now analyze the algorithm.

Theorem 4.4. Algorithm DNFCount, when given a DNF $f$ on $n$ variables with width $w$ and size $m$ as input, returns an $O(\varepsilon)$-additive approximation to $\mathrm{Bias}(f)$ in time

$$(mn)^{O(\log(w/\varepsilon))} (\log n)^{O(w)} 2^{\tilde{O}(w \log(1/\varepsilon))}.$$

Proof. The correctness of the algorithm is easy to argue. For every $h \in \mathcal{H}$,

$$p_h \le \mathrm{Bias}(f_h) + \varepsilon \quad \text{(By Lemma 4.3)}$$

$$\le \mathrm{Bias}(f) + \varepsilon \quad \text{(By Lemma 4.2)}$$

Further by Lemma 4.2, there exists $h \in \mathcal{H}$ such that

$$\mathrm{Bias}(f_h) \ge \mathrm{Bias}(f) - \varepsilon,$$

hence by Lemma 4.3,

$$p_h \ge \mathrm{Bias}(f_h) - \varepsilon \ge \mathrm{Bias}(f) - 2\varepsilon.$$

Thus $p_{\mathcal{H}}$ is a $2\varepsilon$-additive approximation to $\mathrm{Bias}(f)$.

We now bound the running time. Computing $f_h$ for any $h \in \mathcal{H}$ and evaluating it on $G_h(z)$ for $z \in \{0,1\}^{rt}$ can be done in time $O(mn)$. Thus the running time is dominated by $|\mathcal{H}| 2^{rt}$. By standard constructions of $k$-wise independent hash functions, we can take $|\mathcal{H}| = n^{O(k)}$.

Next we bound the seed-length $r$. Recall that

$$k = \log\left(\frac{w}{\varepsilon}\right), \qquad \delta = \frac{\varepsilon}{t} = \frac{k\varepsilon}{w}.$$

Hence $\log(1/\delta) = \log(w/(k\varepsilon)) = k - \log(k)$. Further, $w' = 6k$. Hence by Theorem 3.1,

$$r = O\left(w'^2 \log^2(w') + w' \log(w') \log\left(\frac{1}{\delta}\right) + \log\log(n)\right) = O(k^2 \log^2(k) + \log\log(n)),$$

$$rt = \frac{w}{k} \cdot O(k^2 \log^2(k) + \log\log(n)) = O(wk \log^2 k + w \log\log(n)).$$

So we get

$$|\mathcal{H}| 2^{rt} \le \exp(O(k \log(n) + wk \log^2 k + w \log\log(n))).$$

Overall the runtime is bounded by

$$O(mn) |\mathcal{H}| 2^{rt} = \exp\left(O\left(\log(w/\varepsilon) \log(n) + w \log(w/\varepsilon) (\log\log(w/\varepsilon))^2 + w \log\log(n) + \log(m)\right)\right)$$

$$= (mn)^{O(\log(w/\varepsilon))} (\log n)^{O(w)} 2^{\tilde{O}(w \log(1/\varepsilon))}.$$

□

Theorem 1.6 is obtained from Theorem 4.4 by setting parameters appropriately. 

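Proof. (Proof of Theorem 1.6.) Given a DNF formula with size $m$, we can ignore all terms of width larger than $\log(m/\varepsilon)$ while only changing the bias by $\varepsilon$. Plugging in $w = \log(m/\varepsilon)$, we can bound the running time by

$$\left(\frac{mn}{\varepsilon}\right)^{\tilde{O}(\log\log(n) + \log\log(m) + \log(1/\varepsilon))}.$$

For $m = \mathrm{poly}(n)$, $\varepsilon = 1/\mathrm{poly}(\log n)$, this gives $n^{\tilde{O}(\log\log(n))}$. □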



5 A Derandomized Switching Lemma 

Håstad's celebrated Switching Lemma [Has86] is a powerful tool in proving lower bounds for small-depth circuits. It also has applications in computational learning [LMN93, Man95] and PRG constructions [AW85, GMR+12]. This lemma builds on earlier work due to Ajtai [Ajt83], Furst, Saxe and Sipser [FSS84] and Yao [Yao85].

To state the Switching Lemma, we need to set up some notation. Given $L \subseteq [n]$ and $x \in \{0,1\}^{[n] \setminus L}$ define a restriction $\rho := \rho_{L,x} \in \{*,0,1\}^n$ by $\rho_i = *$ if $i \in L$ and $\rho_i = x_i$ otherwise. We call the set $L = L(\rho)$ the set of "live" variables. For $f : \{0,1\}^n \to \{0,1\}$ and $\rho \in \{*,0,1\}^n$, define $f_\rho : \{0,1\}^{L(\rho)} \to \{0,1\}$ by $f_\rho(y) = f(x)$, where $x_i = y_i$ for $i \in L(\rho)$ and $x_i = \rho_i$ otherwise.
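
Applying a restriction to a DNF can be sketched directly from this definition. In the illustration below, the restriction is a list over `None`, 0 and 1 (with `None` standing for $*$), literals are signed integers, and the function name `restrict` is hypothetical.

```python
from itertools import product

def evaluate(dnf, x):
    """Evaluate a DNF on x, a tuple of 0/1 bits; literal +i means x_i, -i means NOT x_i (1-based)."""
    return any(all((x[abs(l) - 1] == 1) == (l > 0) for l in term) for term in dnf)

def restrict(dnf, rho):
    """Return the DNF for f_rho: drop falsified terms and delete satisfied literals.
    An empty term in the result means the restricted formula is the constant 1."""
    result = []
    for term in dnf:
        live = set()
        falsified = False
        for l in term:
            value = rho[abs(l) - 1]
            if value is None:
                live.add(l)          # variable is live, keep the literal
            elif (value == 1) != (l > 0):
                falsified = True     # a fixed variable falsifies this literal
                break
        if not falsified:
            result.append(frozenset(live))
    return result

f = [frozenset({1, -2}), frozenset({2, 3})]
rho = [None, 0, None]   # x_2 is fixed to 0; x_1 and x_3 stay live
f_rho = restrict(f, rho)
```

With $x_2 = 0$ the second term is falsified and the first shrinks to $x_1$, so the restricted formula depends on a single live variable.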

Given a distribution V on 2'"', let V (abusing notation, the meaning will be clear from 
context) denote the distribution on p G {*,0, 1}" by setting p = pi^^ where L ^ V and 
X Gu {0, l}'"]^'^. Call a distribution V as above p-regular if for each i E [n], Pr L^Ti[i E L] = p. 
Let T>p{n) (we omit n if clear from context) denote the p- regular distribution on subsets L 
of [n] where each element i E [n] is present in L independently with probability p. For 
/ : {0, 1}"" — {0, 1}, let DT(/) denote the minimum depth of a decision tree computing /. 
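
The notation above can be made concrete with a small sketch. The representation is an assumption made only for illustration: a DNF is a list of terms, each term a dict mapping a variable index to the bit its literal requires, and the names `STAR`, `sample_restriction` and `restrict_dnf` are hypothetical. The sampler draws from the fully independent distribution $\mathcal{D}_p(n)$, not from any pseudorandom distribution.

```python
import random

STAR = "*"  # marks a live (unrestricted) variable


def sample_restriction(n, p, rng):
    """Sample rho from D_p(n): each variable is live with probability p,
    otherwise it is fixed to a uniformly random bit."""
    return [STAR if rng.random() < p else rng.randint(0, 1) for _ in range(n)]


def restrict_dnf(terms, rho):
    """Apply rho to a DNF given as a list of terms.

    Returns True if some term is satisfied outright (f_rho is constantly 1),
    otherwise the list of surviving terms restricted to the live variables.
    An empty list means f_rho is constantly 0."""
    survivors = []
    for term in terms:
        live_part = {}
        falsified = False
        for var, want in term.items():
            if rho[var] == STAR:
                live_part[var] = want      # literal stays undetermined
            elif rho[var] != want:
                falsified = True           # literal set to false kills the term
                break
        if falsified:
            continue
        if not live_part:
            return True                    # every literal was satisfied
        survivors.append(live_part)
    return survivors


# f = (x0 AND x1) OR (NOT x2 AND x3), with x1 = 1, x2 = 1 fixed and x0, x3 live
f_terms = [{0: 1, 1: 1}, {2: 0, 3: 1}]
restricted = restrict_dnf(f_terms, [STAR, 1, 1, STAR])
```

In this example the second term is falsified and the first shrinks to the single literal $x_0$, so the restricted function has decision-tree depth 1.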

Theorem 5.1 (Switching Lemma, [Has86]). Let $f : \{0,1\}^n \to \{0,1\}$ be a DNF of width $w$
and let $\rho \leftarrow \mathcal{D}_p(n)$. Then,

\[ \Pr[\mathrm{DT}(f_\rho) > s] \le (5pw)^s. \]

There has been work on finding a derandomized version of the switching lemma, motivated
by better PRG constructions. Such a lemma would choose the set of live variables
in a pseudorandom way, as in [AW85]. One could even ask for a stronger derandomization
where the assignments to the non-live variables are also chosen pseudorandomly; this is done
in [GMR+12]. We limit ourselves to the former case here.
Derandomized switching lemmas were first studied in the seminal work of Ajtai and
Wigderson [AW85], with the aim of constructing better PRGs for constant depth circuits.

Theorem 5.2 ([AW85]). For all $\gamma \in (0,1]$, $p <$ , there is a $p$-regular distribution $\mathcal{D}$ on
$2^{[n]}$ with $L \leftarrow \mathcal{D}$ samplable using $O_\gamma(\log n)$ random bits, and $k = O_\gamma(1)$ such that for $\rho \leftarrow \mathcal{D}$,
and any polynomial size DNF $f$,

\[ \Pr[f_\rho \text{ is not a } k\text{-junta}] \le 1/\mathrm{poly}(n). \]

A very recent result along these lines is due to the authors together with Trevisan and
Vadhan, which gives a near-optimal derandomization in the special case of read-once DNFs
[GMR+12]. They use this to give near-optimal PRGs for read-once DNFs with seed length $\tilde{O}(\log n)$.

We remark that if instead of finding a small set of restrictions that work for all formulas
$f$, we are given the formula $f$ as input, Agrawal et al. [AAI+01] give a polynomial-time
algorithm to find a restriction that simplifies the formula as well as the bounds given by the
switching lemma (Theorem 5.1).

5.1 Our Result 

We give a different argument that essentially recovers the result of Ajtai and Wigderson and
further gives a trade-off between the survival probability $p$, the complexity of the restricted
function and the failure probability of the restriction. Our argument is through repeated
applications of Theorem 1.1 and it seems to us to be simpler than those of Håstad [Has86]
and Ajtai and Wigderson [AW85].

Theorem 5.3. There exists a constant $C$ such that for any $w, s, \delta > 0$ and all $p$ such that

\[ p \le \frac{\delta}{(w \log(1/\epsilon))^{C \log w}} \]

there is a $p$-regular distribution $\mathcal{D}$ on $2^{[n]}$ that can be sampled efficiently using $r$ random bits
where

\[ r = r(n, s, \epsilon, \delta) = O\left((\log w) \cdot (\log n + s \log(1/\delta)) + w \log(w \log(1/\epsilon))\right), \]

the indicator events $\mathbb{1}\{i \in L\}$ are $p$-biased and the following holds: for any width $w$ DNF
$f : \{0,1\}^n \to \{0,1\}$, and $\rho \leftarrow \mathcal{D}$,

\[ \Pr[f_\rho \text{ does not have } \epsilon\text{-sandwiching approximations in } \mathrm{DNF}(s, n)] \le \epsilon + \delta^{\Omega(s)}. \]

In particular, by setting $\delta = 1/n^{\gamma}$, $s = O(1/\gamma)$, $\epsilon = 1/\mathrm{poly}(n)$, $w = O(\log n)$, we
almost recover the derandomized switching lemma of Ajtai and Wigderson, with the main
difference being that we need $O((\log n)(\log\log n))$ bits to sample from $\mathcal{D}$ and we only get that $f_\rho$
has sandwiching approximations by width $O_\gamma(1)$ DNFs.

Our derandomization is based on the intuition that the switching lemma is easy to show
when the number of terms in the original DNF $f$ is small. For instance, let $f = \vee_j T_j$ be a
width $w$ DNF. Note that for $0 < p < 1$, and $\rho \leftarrow \mathcal{D}_p$, the probability that a single term $T_j$
survives the restriction $f_\rho$ (is not set to be a constant) is at most



In particular if p < the above probability is at most 6/2"". Thus, by linearity of 

expectation, the expected number of terms that survive the restriction is at most $O(1)$.
Hence, by Markov's inequality, the restricted DNF $f_\rho$ has very few surviving terms with high
probability. Further, as we are only using Markov's inequality, the above argument would
work even if the restriction $\rho$ is sampled from a distribution where the choices for different
variables are only $k$-wise independent for $k = O(w)$.
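
As a rough numerical companion to this intuition, the sketch below computes the probability that a term is not falsified, which upper-bounds the probability that it stays non-constant. It rests on assumptions that are ours rather than the paper's: the restriction is drawn from the fully independent $\mathcal{D}_p$, and every term has exactly $w$ literals on distinct variables. Under these assumptions each literal avoids being set to false with probability $p + (1-p)/2 = (1+p)/2$, so the term is not falsified with probability $((1+p)/2)^w$, which is at most $e/2^w$ when $p \le 1/w$. The function names are hypothetical.

```python
import math


def term_not_falsified_prob(w, p):
    """Probability that no literal of a width-w term is set to false when each
    variable is independently live w.p. p and otherwise a uniform bit."""
    return ((1.0 + p) / 2.0) ** w


def expected_unfalsified_terms(m, w, p):
    """Linearity of expectation over m terms, each of width exactly w."""
    return m * term_not_falsified_prob(w, p)


# With 2^w terms of width w and p = 1/w, the expectation stays below e.
w = 10
print(expected_unfalsified_terms(2 ** w, w, 1.0 / w))
```

The expectation is all that Markov's inequality needs, which is why limited independence suffices for this step.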

We use Theorem 1.1 to reduce the case of arbitrary DNFs of small width to that of DNFs
with a small number of terms and then use an argument similar to the above. Unfortunately,
the bound in Theorem 1.1 is not sufficiently strong, so we need to use somewhat stronger
restrictions where the survival probability is $p = w^{-r}$ for $r > 1$. Such a restriction can be
viewed as a sequence of $r$ rounds of random restrictions, each leaving a $1/w$ fraction of live
variables. We argue that in each round, the width of the formula decreases by a factor of 2 with high
probability and then iteratively apply the argument to the new width-$w/2$ formulas. After
$O(\log w)$ rounds, the width reduces to a constant. This corresponds to a random restriction
where the probability of being alive is $\exp(-\Omega(\log^2 w))$. Moreover, this argument works even
when the random restrictions only have limited independence, yielding Theorem 5.3.

For $k \le n$, let $\mathcal{D}_p(k)$ denote the class of $p$-regular distributions on $2^{[n]}$ such that for
$L \leftarrow \mathcal{D} \in \mathcal{D}_p(k)$, $\Pr[I \subseteq L] \le 2p^{|I|}$ for all $I \subseteq [n]$, $|I| \le k$. There exist explicit distributions
$\mathcal{D} \in \mathcal{D}_p(k)$ that can be sampled using $O(k \log(1/p) + \log n)$ random bits. For instance, one
can use $p^k$-almost $k$-wise independent $p$-biased variables from [NN93].

Claim 5.4. There exists a constant $c < 1$ such that the following holds for all $\epsilon, \delta > 0$,
$0 < s < w$ and

\[ p \le p(w, s) := \frac{c\, \delta^{s/w}}{(w^3 \log(1/\epsilon))^2}. \]

For any width $w$ DNF $f : \{0,1\}^n \to \{0,1\}$ and $\rho \leftarrow \mathcal{D} \in \mathcal{D}_p(w)$, with probability at least
$1 - \delta^{s/2} - \epsilon$ there exist width $w/2$ DNFs $f^\ell_\rho, f^u_\rho : \{0,1\}^{L(\rho)} \to \{0,1\}$ that are $\epsilon$-sandwiching
approximators for $f_\rho$.

Proof of Claim 5.4. Let $f^\ell, f^u$ be width $w$ DNFs with at most $h(w) = w^{3w} (C \log(1/\epsilon))^w$
terms that are $\epsilon^2/2$-sandwiching approximators for $f$ as guaranteed by Theorem 1.1 for $C$
a large constant. Consider a random restriction $\rho$ sampled from a distribution in $\mathcal{D}_p(w/2)$.
Then, the probability that a fixed term of $f^\ell$ has more than $w/2$ live variables under $\rho$ is
at most $2^w \cdot p^{w/2}$. Therefore, by a union bound, the probability that $f^\ell_\rho$ has width more
than $w/2$ is at most $h(w) 2^w p^{w/2} \le \delta^{s/2}/2$ for a sufficiently small constant $c$. Similarly, the
probability that $f^u_\rho$ has width more than $w/2$ is at most $\delta^{s/2}/2$.
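Note that as $f^\ell \le f \le f^u$, we have $f^\ell_\rho \le f_\rho \le f^u_\rho$. We now need to show that $f^\ell_\rho, f^u_\rho$ are close to
$f_\rho$ with high probability. Let $\rho = \rho_{L,x}$ and consider a fixing of the set of live variables $L$.
Then as $f^\ell, f^u$ are $\epsilon^2/2$-sandwiching approximators for $f$,

\[ \mathbb{E}_{x \in_u \{0,1\}^{[n] \setminus L}}[\mathrm{Bias}(f_\rho)] = \mathrm{Bias}(f) \le \mathrm{Bias}(f^\ell) + \frac{\epsilon^2}{2} = \mathbb{E}_{x \in_u \{0,1\}^{[n] \setminus L}}[\mathrm{Bias}(f^\ell_\rho)] + \frac{\epsilon^2}{2}. \]

Therefore,

\[ \mathbb{E}[\mathrm{Bias}(f_\rho) - \mathrm{Bias}(f^\ell_\rho)] \le \frac{\epsilon^2}{2}. \]

Thus, by Markov's inequality,

\[ \Pr[\mathrm{Bias}(f_\rho) - \mathrm{Bias}(f^\ell_\rho) > \epsilon] \le \frac{\epsilon}{2}. \]

Using a similar argument for $f^u$, and a union bound, we get that $f_\rho$ is $\epsilon$-sandwiched by
$(f^\ell_\rho, f^u_\rho)$ with probability at least $1 - \delta^{s/2} - \epsilon$. □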

We now prove Theorem 5.3. 

Proof of Theorem 5.3. Let $t$ be such that $w/2^t = s$ (we ignore the minor technicality of $t$
being non-integral) and for $r = 1, \ldots, t$, let $p_r = p(w/2^r, s)$ as defined in the above claim.
For $i \in [t]$, let $L_i$ be chosen independently from a distribution in $\mathcal{D}_{p_i}(w/2^i)$. Let $L = \cap_{i=1}^{t} L_i$
and for $x \in_u \{0,1\}^n$, let $\rho = \rho_{L,x}$. Then, $\rho$ is a $q$-regular random restriction with



t 



=1 



«;3 log(l/£))''°s" («; log(l/e))^^°s" ' 



for C a sufficiently large constant. 

Define the composition of two restrictions $\rho' \in \{*,0,1\}^n$ and $\rho'' \in \{*,0,1\}^{L(\rho')}$ in the
natural way by $(\rho' \circ \rho'')_i = \rho'_i$ if $i \notin L(\rho')$ and $(\rho' \circ \rho'')_i = \rho''_i$ otherwise. Then, by definition,
we can view $\rho$ as a composition of independently chosen random restrictions $\rho_t \circ \rho_{t-1} \circ \cdots \circ \rho_1$,
where $\rho_i = \rho_{L_i, x_i}$ (with $x_i \in_u \{0,1\}^n$). Further, for any function $g$, $g_\rho = (((g_{\rho_1})_{\rho_2}) \cdots)_{\rho_t}$.

Therefore, by iteratively applying Claim 5.4 $t$ times with the random
restrictions $\rho_1, \ldots, \rho_t$ and a union bound, we get that with probability at least $1 - t(\delta^{s/2} + \epsilon)$,
there exists a lower approximating DNF $f^\ell_\rho : \{0,1\}^{L} \to \{0,1\}$ of width at most $s$ such
that $f^\ell_\rho \le f_\rho$ and $\mathrm{Bias}(f_\rho) - \mathrm{Bias}(f^\ell_\rho) \le t\epsilon$. Similarly, by iteratively applying the claim to the
upper approximators given by the claim, we get that with probability at least $1 - 2t(\delta^{s/2} + \epsilon)$,
$f_\rho$ has $(t\epsilon)$-sandwiching approximators that are width-$s$ DNFs.
Finally, the number of bits needed to sample $L$ is

\[ r(n, s, \epsilon, \delta) = \sum_{r=1}^{t} O\left(\frac{w}{2^r} \cdot \log(1/p(w/2^r, s)) + \log n\right) \]

\[ = O((\log n)(\log w)) + \sum_{r=1}^{t} \frac{w}{2^r} \left( \frac{s 2^r}{w} \cdot O(\log(1/\delta)) + O(\log(w \log(1/\epsilon))) \right) \]

\[ = O\left((\log w) \cdot (\log n + s \log(1/\delta)) + w \log(w \log(1/\epsilon))\right). \]

The theorem now follows from applying the above argument to $\delta' = \delta/2t$, $\epsilon' = \epsilon/t$ and
noting that this only changes the constant terms in the final bounds. □



6 Open Problems 

A natural open question is to show optimal bounds for DNF sparsification. We believe
this question is interesting in its own right, even without the sandwiching requirement.
Formally, let $m(w, \epsilon)$ be the smallest integer such that every width-$w$ DNF formula can
be $\epsilon$-approximated by a width-$w$ DNF with $m$ terms. Theorem 1.1 shows that
$m(w, \epsilon) \le (w \log(1/\epsilon))^{O(w)}$. Rocco Servedio [Ser11] observed that the Majority function on $2w$ variables
(which is a width-$w$ DNF) shows that $m(w, \epsilon) \ge 4^{w - o(w)}$ for any constant $\epsilon$. We are unaware
of a better lower bound, and it is conceivable that the right bound is exponential in $w$. We
pose this conjecture:

Conjecture 6.1. (Weaker Version) There exists a function $c(\epsilon)$ such that

\[ m(w, \epsilon) \le O(c(\epsilon)^w). \]

(Stronger Version) There exists a constant $c$ such that

\[ m(w, \epsilon) \le O(\log(1/\epsilon)^{cw}). \]

The weaker version, if true, will imply that $\log(n)$-width DNFs can be $\epsilon$-approximated
by $n^{O(1)}$ size DNFs for any constant $\epsilon$. Currently Theorem 1.1 gives the weaker bound of

\[ m(\log(n), \epsilon) \le n^{O(\log\log(n) \log\log(1/\epsilon))}. \]

The stronger version, if true, will strengthen Friedgut's theorem in the context of DNFs.

Mansour's Conjecture. Conjecture 6.1 is similar in spirit to Mansour's conjecture, which
also asserts that DNF formulas admit concise representations, but in the Fourier domain. It
also implies reductions between the conjecture for small-width DNFs and small-size DNFs.

We say that $f : \{0,1\}^n \to \{0,1\}$ has a $t$-sparse $\epsilon$-approximation if there exists
$p : \{0,1\}^n \to \mathbb{R}$ with at most $t$ non-zero Fourier coefficients such that

\[ \mathbb{E}_x\left[(f(x) - p(x))^2\right] \le \epsilon. \]
Conjecture 6.2. (Mansour's Conjecture for size) [Man94]

(Weaker version) There exists a function $c(\epsilon)$ such that every DNF of size $m$ has an
$m^{c(\epsilon)}$-sparse $\epsilon$-approximation.

(Stronger version) Every DNF of size $m$ has an $m^{O(\log(1/\epsilon))}$-sparse $\epsilon$-approximation.

Mansour originally stated the stronger version of the conjecture; the weaker version
appears in [O'D12]. The following analogue of Mansour's conjecture for small width suggests
itself. To our knowledge, this conjecture has not appeared explicitly in the literature.

Conjecture 6.3. (Mansour's conjecture for width)

(Weaker version) There exists a function $c(\epsilon)$ such that every DNF of width $w$ has a
$2^{c(\epsilon) w}$-sparse $\epsilon$-approximation.

(Stronger version) Every DNF of width $w$ has a $2^{O(w \log(1/\epsilon))}$-sparse $\epsilon$-approximation.

The best known bounds for both size and width are due to Mansour, who shows that
every DNF of width $w$ has a $w^{O(w \log(1/\epsilon))}$-sparse $\epsilon$-approximation and then derives a bound
for size using $w = O(\log(m/\epsilon))$ [Man95].

We feel that this width analogue of Mansour's Conjecture is natural; indeed most results
on DNFs proceed by first tackling the width-$w$ case, and then translating it to DNFs of size
$m$ using $w \le \log(m/\epsilon)$ [Has86, LMN93, Man95]. This substitution also shows that

• The weaker version of Mansour's Conjecture for width implies the weaker version of
Mansour's Conjecture for size.

• The stronger version of Mansour's Conjecture for width implies the stronger version of
Mansour's Conjecture for size, as long as $\epsilon \ge 1/\mathrm{poly}(m)$.

Conjecture 6.1 implies the reverse implications.

Lemma 6.4. • Assume the stronger version of Conjecture 6.1. Then the stronger version
of Mansour's Conjecture for size implies that every width $w$ DNF formula has a
$2^{O(w \log(1/\epsilon) \log\log(1/\epsilon))}$-sparse $\epsilon$-approximation.

• Assume the weaker version of Conjecture 6.1. Then the weaker version of Mansour's
Conjecture for size implies the weaker version of Mansour's Conjecture for width.

Note that if we replace Conjecture 6.1 with Theorem 1.1, this does not improve on the 
bound from [Man95]. So in this context, the improved dependence on w in Conjecture 6.1 
is crucial. 

Sparsification using the Greedy Algorithm. A natural approach to sparsifying a DNF
formula $f$ is to view it as a set-covering problem, where we wish to cover $f^{-1}(1) \subseteq \{0,1\}^n$ by
width $w$ terms. One could use the greedy algorithm in the hope that it constructs a sparse
cover. It would be interesting to analyze its performance. In this direction, Jan Vondrák
has pointed out that one can use the analysis of greedy set cover to argue that if there is a
lower sandwiching DNF formula of size $m^{\ell}(w, \epsilon)$ which is $\epsilon$-close to $f$, then greedy returns a
$2\epsilon$-approximation of size at most $m^{\ell}(w, \epsilon) \ln(1/\epsilon)$ [Von12].
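
The set-cover view can be sketched on toy instances as follows. This is a brute-force illustration under assumptions of our own, not the analysis attributed to Vondrák: candidate terms are restricted to the terms of $f$ itself (so the output is automatically a lower sandwiching formula), satisfying assignments are enumerated explicitly (feasible only for small $n$), and the stopping rule is an uncovered mass of at most $\epsilon \cdot 2^n$. The names `satisfies` and `greedy_sparsify` are hypothetical.

```python
from itertools import product


def satisfies(term, x):
    """term maps variable index -> required bit; x is a tuple of bits."""
    return all(x[var] == want for var, want in term.items())


def greedy_sparsify(terms, n, eps):
    """Greedily pick terms of f until at most an eps fraction of all 2^n inputs
    are satisfying assignments of f that the chosen terms fail to cover."""
    cover = [frozenset(x for x in product((0, 1), repeat=n) if satisfies(t, x))
             for t in terms]
    uncovered = set().union(*cover)       # all of f^{-1}(1)
    budget = eps * 2 ** n
    chosen = []
    while len(uncovered) > budget:
        # pick the term covering the most still-uncovered assignments
        best = max(range(len(terms)), key=lambda i: len(cover[i] & uncovered))
        chosen.append(terms[best])
        uncovered -= cover[best]
    return chosen


# f = x0 OR (x0 AND x1) OR (NOT x1 AND NOT x2) on 3 variables
f_terms = [{0: 1}, {0: 1, 1: 1}, {1: 0, 2: 0}]
sparse = greedy_sparsify(f_terms, 3, 0.125)
```

With $\epsilon = 1/8$ the sketch keeps only the term $x_0$, which misses the single satisfying assignment $(0,0,0)$; with $\epsilon = 0$ it also keeps the third term and drops the redundant second one.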
Deterministic DNF counting. The question of finding a deterministic polynomial-time
algorithm for approximate DNF counting remains open. One approach towards this goal would
be to construct pseudorandom generators for DNF formulas with seed length
$O(\log(n) + \log(m) + \log(1/\epsilon))$. Such constructions are currently not known even for read-once DNFs.
A recent result by Trevisan, Vadhan and the authors gets a seed length of
$\tilde{O}(\log(n) + \log(1/\epsilon))$ in the read-once case [GMR+12].
A Proofs from Section 2

We first show that Lemma 2.3 is equivalent to Lemma A.2 below from [Ros10].

Definition A.1 ([Ros10]). Let $\mathcal{F}$ be a family of sets over a universe $U$ and let $Y = \cap_{T \in \mathcal{F}} T$.
Call $\mathcal{F}$ a $\gamma$-sunflower if for a random set $W \subseteq U$, with each element of $U$ present in $W$
independently with probability 1/2,

\[ \Pr[\exists T \in \mathcal{F}, (T \setminus Y) \cap W = \emptyset] > 1 - \gamma. \]
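
For small families the probability in this definition can be computed exactly by enumeration. The sketch below is only an illustration; the function name `sunflower_failure_prob` is hypothetical. It uses the observation that only elements lying in some petal $T \setminus Y$ affect the event, so it suffices to enumerate subsets of the union of the petals. It returns the complementary probability, which is then compared against $\gamma$.

```python
def sunflower_failure_prob(family):
    """Exact probability that every petal T \\ Y meets W, where Y is the common
    intersection of the family and W contains each element independently
    with probability 1/2."""
    sets = [frozenset(t) for t in family]
    core = frozenset.intersection(*sets)          # Y
    petals = [t - core for t in sets]
    universe = sorted(set().union(*petals))       # only these elements matter
    misses = 0
    for mask in range(2 ** len(universe)):
        w = {u for i, u in enumerate(universe) if (mask >> i) & 1}
        if all(petal & w for petal in petals):    # no petal is disjoint from W
            misses += 1
    return misses / 2 ** len(universe)


# A classical sunflower with core {1} and three singleton petals.
print(sunflower_failure_prob([{1, 2}, {1, 3}, {1, 4}]))
```

A classical sunflower with $k$ disjoint petals of sizes $a_1, \ldots, a_k$ fails with probability $\prod_j (1 - 2^{-a_j})$, which the enumeration reproduces.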

Lemma A.2 ([Ros10]). Let $\mathcal{F}$ be a family of sets over a universe $U$ each of size at most
$w$. If $|\mathcal{F}| > w! \cdot (2.47 \log(1/\gamma))^w$, then $\mathcal{F}$ contains a $\gamma$-sunflower.

Proof of Lemma 2.3. As $f$ is unate, without loss of generality suppose that $f$ is monotone.
Let $U = [n]$ and $\mathcal{F} = \{T_i : 1 \le i \le m\}$. By the above lemma, there exists a $\gamma$-sunflower
$\mathcal{F}' = \{T_{i_1}, \ldots, T_{i_k}\}$ for

7 = /.(-/-')^^^ where /. = 
We claim that the lemma holds for the terms in $\mathcal{F}'$ and $Y = \cap_{j=1}^{k} T_{i_j}$. Let $x \in_u \{0,1\}^n$
and let $W = \{i : x_i = 0\}$. Then, each element of $U$ is present in $W$ independently with
probability 1/2. Therefore, as $\mathcal{F}'$ is a $\gamma$-sunflower,

\[ \Pr\left[\vee_{j=1}^{k} (T_{i_j} \setminus Y) = 1\right] = \Pr[\exists T \in \mathcal{F}', (T \setminus Y) \cap W = \emptyset] > 1 - \gamma. \]

□

We next show Fact 2.8. 

Proof of Fact 2.8. From the definition of 7( ) from Equation 2.2, it is easy to check that 
7(j/2'") > j^/"'/10u7. We shall also use the following inequality that follows from partial 
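integration: for any $\theta > k > 0$,

\[ \int_{\theta}^{\infty} x^k e^{-x}\, dx = \sum_{i=0}^{k} \binom{k}{i} (i!) \cdot (\theta^{k-i} e^{-\theta}) \le (k+1) \theta^k e^{-\theta}. \qquad (A.1) \]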
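
The tail identity and bound for $\int_\theta^\infty x^k e^{-x}\,dx$ can be sanity-checked numerically for integer $k$. The sketch below is an illustration only, with hypothetical function names; it compares the closed-form sum $\sum_{i=0}^{k} \binom{k}{i}\, i!\, \theta^{k-i} e^{-\theta}$ with a midpoint-rule estimate of the integral truncated at an assumed upper limit of 80, beyond which the integrand is negligible for the small $k$ used here. The upper bound $(k+1)\theta^k e^{-\theta}$ relies on $\theta \ge k$, since then each summand is at most $\theta^k e^{-\theta}$.

```python
import math


def tail_closed_form(k, theta):
    """sum_{i=0}^{k} C(k, i) * i! * theta^(k-i) * e^(-theta), for integer k >= 0."""
    total = sum(math.comb(k, i) * math.factorial(i) * theta ** (k - i)
                for i in range(k + 1))
    return total * math.exp(-theta)


def tail_numeric(k, theta, upper=80.0, steps=50_000):
    """Midpoint-rule estimate of the integral of x^k e^(-x) over [theta, upper]."""
    h = (upper - theta) / steps
    total = 0.0
    for j in range(steps):
        x = theta + (j + 0.5) * h
        total += x ** k * math.exp(-x)
    return h * total


def tail_upper_bound(k, theta):
    """The bound (k + 1) * theta^k * e^(-theta), valid when theta >= k."""
    return (k + 1) * theta ** k * math.exp(-theta)


print(tail_closed_form(3, 5.0), tail_numeric(3, 5.0), tail_upper_bound(3, 5.0))
```

For $k = 3$, $\theta = 5$ the closed form equals $236\,e^{-5}$, against the bound $500\,e^{-5}$.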



Therefore, for 6 = W^/'^/lO 
00 00 

j=W+l j=W+l 



(ji/'"/10io) 



00 

'w 



y""-^ ■ e-^dy (substituting y = x^/^'/lOw) 

< lOw"^ ■ (lOw)'"-^ ■ w ■ e'"-^e-^ (by Equation A.l) 

< lOw^ ■ ly ■exp(-10u;^log(l/£)) 

= exp (log(lOw^) + wlog2 + 3wlogw + wlog(50 log(l/£:)) — lOw;^ log(l/£:)) 

< exp(-log(l/£)) = e 

where the last inequality can be checked numerically for w > 1 and 5 < 1/4. □ 



B Proofs from Section 4 

In this section, we prove the two Lemmas from [LV96] that are used in our analysis. We 
restate them here for the reader's convenience. 

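Lemma B.1. (Lemma 4.2 restated) We have

\[ \forall h \in \mathcal{H}, \quad \mathrm{Bias}(f_h) \le \mathrm{Bias}(f), \]
\[ \mathbb{E}_{h \in \mathcal{H}}[\mathrm{Bias}(f_h)] \ge \mathrm{Bias}(f) - \epsilon. \]

Proof. As $f_h$ is obtained by dropping terms in $f$, we have $f_h(x) \le f(x)$ for all $x \in \{0,1\}^n$, so
$\mathrm{Bias}(f_h) \le \mathrm{Bias}(f)$. This also implies that

\[ \mathrm{Bias}(f_h) = \frac{1}{2^n} \sum_{x \in f^{-1}(1)} f_h(x). \qquad (B.1) \]

Taking expectation over $h$, we have

\[ \mathbb{E}_{h \in \mathcal{H}}[\mathrm{Bias}(f_h)] = \frac{1}{2^n} \sum_{x \in f^{-1}(1)} \Pr_{h \in \mathcal{H}}[f_h(x) = 1]. \qquad (B.2) \]

Fix an $x \in f^{-1}(1)$ and a term $T_i$ of $f$ that it satisfies. If $T_i$ is included in $f_h$, which
happens unless $T_i$ is bad for $h$, then $f_h(x) = 1$. By Lemma 4.1 and a union bound,

\[ \Pr_{h \in \mathcal{H}}[T_i \text{ is bad for } h] \le t \cdot 2^{-k} \le \frac{\epsilon}{w} \cdot \frac{w}{k} \le \epsilon. \]

Hence we have

\[ \Pr_{h \in \mathcal{H}}[f_h(x) = 1] \ge 1 - \epsilon. \]

Plugging this into Equation (B.2) gives

\[ \mathbb{E}_{h \in \mathcal{H}}[\mathrm{Bias}(f_h)] \ge \frac{1}{2^n} \sum_{x \in f^{-1}(1)} (1 - \epsilon) = (1 - \epsilon)\, \mathrm{Bias}(f). \]

□

Lemma B.2. (Lemma 4.3 restated) We have

\[ |p_h - \mathrm{Bias}(f_h)| \le \epsilon. \]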

Proof. Let $\mathcal{D}_0$ be the uniform distribution over $\{0,1\}^n$. For $j \in [t]$, let $\mathcal{D}_j$ be the distribution
obtained from $\mathcal{D}_{j-1}$ by replacing the uniform distribution on variables in bucket $B_j$ with an
independent copy of the output of the generator $G$. Thus $\mathcal{D}_t$ is the output distribution of $G_h$.
We claim that for $j \in [t]$,

\[ \left| \Pr_{x \in \mathcal{D}_{j-1}}[f_h(x) = 1] - \Pr_{x \in \mathcal{D}_j}[f_h(x) = 1] \right| \le \delta. \qquad (B.3) \]

Since $\mathcal{D}_{j-1}$ and $\mathcal{D}_j$ differ only on the distribution over bucket $B_j$, we first sample assignments
for the other buckets. The resulting formula on the variables in $B_j$ is a DNF with width at
most $w'$. Hence it is $\delta$-fooled by $G$, which gives Equation (B.3).

We now have

\[ |\mathrm{Bias}(f_h) - p_h| = \left| \Pr_{x \in \mathcal{D}_0}[f_h(x) = 1] - \Pr_{x \in \mathcal{D}_t}[f_h(x) = 1] \right| \le \sum_{j=1}^{t} \left| \Pr_{x \in \mathcal{D}_{j-1}}[f_h(x) = 1] - \Pr_{x \in \mathcal{D}_j}[f_h(x) = 1] \right| \le t\delta \le \epsilon. \]

□